Introduction to Reservoir Computing with ReservoirPy¶
Nathan Trouvain
Inria, IMN, LaBRI - Bordeaux, France
UCLA - November 14th 2023
Key concepts and features¶
- NumPy, SciPy, and that's it!
- Efficient execution (distributed implementation)
- Online and offline learning rules
- Support for complex model architectures
- Documentation: https://reservoirpy.readthedocs.io/en/latest/
- GitHub: https://github.com/reservoirpy/reservoirpy
General info¶
- Everything is NumPy (and more generally "standard" scientific Python)
- The first axis of arrays always represents time.
Timeseries prediction ¶
The Lorenz attractor
$$ \begin{split} \dot{x}(t) &= \sigma (y(t) - x(t)) \\ \dot{y}(t) &= \rho x(t) - y(t) - x(t)z(t) \\ \dot{z}(t) &= x(t)y(t) - \beta z(t) \end{split} $$
- Describes convective movements in a fluid. Highly chaotic!
from reservoirpy.datasets import lorenz
timesteps = 3000
X = lorenz(timesteps, x0=[17.67, 12.93, 43.91])
plot_lorenz(X, 1000)
Knowing series value at timestep $t$:
- How can we predict $t+1$, $t+100$...?
- How can we predict $t+1$, $t+2$, $\dots$, $t+n$?
10-steps ahead forecasting¶
Predict $X(t + 10)$ knowing $X(t)$.
from reservoirpy.datasets import to_forecasting
x, y = to_forecasting(X, forecast=10)
X_train1, y_train1 = x[:2000], y[:2000]
X_test1, y_test1 = x[2000:], y[2000:]
plot_train_test(X_train1, y_train1, X_test1, y_test1)
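Under the hood, `to_forecasting` simply shifts the series against itself. A minimal NumPy sketch of the same operation (assuming a 2D `(timesteps, features)` array; `to_forecasting_sketch` is an illustrative name, not the library function):

```python
import numpy as np

def to_forecasting_sketch(X, forecast=10):
    """Shift a timeseries against itself: y[t] = X[t + forecast]."""
    x = X[:-forecast]   # inputs: all but the last `forecast` steps
    y = X[forecast:]    # targets: the series shifted `forecast` steps ahead
    return x, y

X_toy = np.arange(30).reshape(15, 2)  # toy series: 15 timesteps, 2 features
x_toy, y_toy = to_forecasting_sketch(X_toy, forecast=10)
```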
Some reservoir reminder¶
$$ x[t+1]= (1 - \alpha) x[t] + \alpha f(W \cdot x[t] + W_{in} \cdot u[t] + W_{fb} \cdot y[t]) $$ $$ y[t]= W_{out}^{\intercal} x[t] $$
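The leaky-integration update above can be sketched directly in NumPy (the feedback term $W_{fb} \cdot y[t]$ is omitted, and the weight scales here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1234)
units, dim = 100, 3
alpha = 0.3                                # leaking rate
W = rng.normal(size=(units, units)) * 0.1  # recurrent weights (arbitrary scale)
W_in = rng.normal(size=(units, dim))       # input weights

def reservoir_step(x, u, alpha=alpha):
    """One leaky-integration update of the reservoir state (no feedback)."""
    return (1 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u)

x = np.zeros(units)      # initial state
u = np.ones(dim)         # one input timestep
x = reservoir_step(x, u)
```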
ESN preparation¶
units = 100 # - number of units
leak_rate = 0.3 # - leaking rate
spectral_radius = 0.95 # - spectral radius
input_scaling = 0.5 # - input scaling (also called input gain)
connectivity = 0.1 # - recurrent weights connectivity probability
input_connectivity = 0.2 # - input weights connectivity probability
regularization = 1e-4 # - L2 regularization coefficient
transient = 100 # - number of warmup steps
seed = 1234 # - used for reproducibility
from reservoirpy.nodes import Reservoir, Ridge
reservoir = Reservoir(units, input_scaling=input_scaling, sr=spectral_radius,
lr=leak_rate, rc_connectivity=connectivity,
input_connectivity=input_connectivity, seed=seed)
readout = Ridge(ridge=regularization)
esn = reservoir >> readout
reservoir_fb = reservoir << readout
esn_fb = reservoir_fb >> readout
ESN training¶
Learning is performed offline: the model is updated only once, using all available training data.
esn.fit(X_train1, y_train1, warmup=transient);
plot_readout(readout)
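Fitting the Ridge readout amounts to solving a Tikhonov-regularized least-squares problem on the collected reservoir states. A minimal NumPy sketch with random placeholder states (not ReservoirPy's actual internals):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(200, 100))  # reservoir states, shape (time, units)
Y = rng.normal(size=(200, 3))    # targets, shape (time, outputs)
ridge = 1e-4                     # L2 regularization coefficient

# Closed-form ridge solution: Wout = (S'S + ridge * I)^-1 S'Y
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ Y)
Y_hat = S @ W_out                # readout predictions
```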
ESN forecast¶
y_pred1 = esn.run(X_test1)
plot_results(y_pred1, y_test1)
Coefficient of determination $R^2$ and normalized root mean square error (NRMSE):
rsquare(y_test1, y_pred1), nrmse(y_test1, y_pred1)
(0.9966686152092658, 0.011448035696843583)
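Both metrics can be computed by hand; a sketch assuming NRMSE is normalized by the range of the true series (ReservoirPy's `reservoirpy.observables` may use a different default normalization):

```python
import numpy as np

def nrmse_sketch(y_true, y_pred):
    """RMSE normalized by the range of the true series."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def rsquare_sketch(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([[0.0], [1.0], [2.0], [3.0]])
y_pred = np.array([[0.1], [0.9], [2.1], [2.9]])
```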
Closed-loop reservoir¶
- Train the ESN on a 1-step-ahead prediction task (learn the flow function $f(x_t) = x_{t+1}$)
- Use ESN to predict on its own activity ("generative" mode).
units = 300 # - number of units
leak_rate = 0.3 # - leaking rate
spectral_radius = 1.25 # - spectral radius
input_scaling = 0.1 # - input scaling (also called input gain)
connectivity = 0.1 # - recurrent weights connectivity probability
input_connectivity = 0.2 # - input weights connectivity probability
regularization = 1e-4 # - L2 regularization coefficient
transient = 100 # - number of warmup steps
seed = 1234 # - used for reproducibility
Forecast of close future¶
esn = reset_esn()
x, y = to_forecasting(X, forecast=1)
X_train3, y_train3 = x[:2000], y[:2000]
X_test3, y_test3 = x[2000:], y[2000:]
esn = esn.fit(X_train3, y_train3, warmup=transient)
Closed-loop model¶
- 100 timesteps used as warmup;
- 500 timesteps generated from reservoir dynamics alone, without external inputs.
seed_timesteps = 100
warming_inputs = X_test3[:seed_timesteps]
warming_out = esn.run(warming_inputs) # warmup
nb_generations = 500
X_gen = np.zeros((nb_generations, 3))
y = warming_out[-1]
for t in range(nb_generations): # generation
y = esn(y)
X_gen[t, :] = y
X_t = X_test3[seed_timesteps: nb_generations+seed_timesteps]
plot_generation(X_gen, X_t, warming_out=warming_out, warming_inputs=warming_inputs)
plot_attractors(X_gen, X_t, warming_inputs, warming_out)
Online learning ¶
Online learning updates the model incrementally: training can happen at any time, one sample after another.
In the following, we will use the Recursive Least Squares (RLS) algorithm.
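For intuition, a textbook RLS update (forgetting factor fixed to 1) looks like the sketch below; this is illustrative only, not ReservoirPy's implementation:

```python
import numpy as np

def rls_step(W_out, P, s, y):
    """One RLS update of readout weights (forgetting factor = 1)."""
    k = P @ s / (1.0 + s @ P @ s)   # gain vector
    e = y - W_out.T @ s             # a-priori prediction error
    W_out = W_out + np.outer(k, e)  # correct weights along the gain
    P = P - np.outer(k, s @ P)      # update inverse correlation matrix
    return W_out, P

rng = np.random.default_rng(0)
units, outputs = 10, 1
true_w = rng.normal(size=(units, outputs))
W_out = np.zeros((units, outputs))
P = np.eye(units) / 1e-6            # large initial P ~ weak prior

for _ in range(200):
    s = rng.normal(size=units)
    W_out, P = rls_step(W_out, P, s, true_w.T @ s)  # noiseless linear target
```

On this noiseless toy problem, the readout converges to the true linear map.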
from reservoirpy.nodes import RLS
reservoir = Reservoir(units, input_scaling=input_scaling, sr=spectral_radius,
lr=leak_rate, rc_connectivity=connectivity,
input_connectivity=input_connectivity, seed=seed)
readout = RLS() # Recursive Least Squares
esn_online = reservoir >> readout
Step-by-step training¶
outputs_pre = np.zeros(X_train1.shape)
for t, (x, y) in enumerate(zip(X_train1, y_train1)): # for each timestep:
prediction = esn_online.train(np.atleast_2d(x), np.atleast_2d(y))
outputs_pre[t, :] = prediction
plot_results(outputs_pre, y_train1, sample=100)
plot_results(outputs_pre, y_train1, sample=500)
Whole sequence training¶
esn_online.train(X_train1, y_train1)
pred_online = esn_online.run(X_test1) # Wout is now learned and fixed
plot_results(pred_online, y_test1, sample=500)
Coefficient of determination $R^2$ and NRMSE:
rsquare(y_test1, pred_online), nrmse(y_test1, pred_online)
(0.9954163172865621, 0.013428448857951327)
Diving in the reservoir¶
units = 300 # - number of units
leak_rate = 0.3 # - leaking rate
spectral_radius = 1.25 # - spectral radius
input_scaling = 0.1 # - input scaling (also called input gain)
connectivity = 0.1 # - recurrent weights connectivity probability
input_connectivity = 0.2 # - input weights connectivity probability
regularization = 1e-4 # - L2 regularization coefficient
transient = 100 # - number of warmup steps
seed = 1234 # - used for reproducibility
1. The spectral radius¶
The spectral radius is the largest absolute eigenvalue of the recurrent weight matrix $W$.
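Computing and adjusting the spectral radius is straightforward in NumPy (an illustrative sketch; ReservoirPy rescales $W$ internally when you pass `sr`):

```python
import numpy as np

rng = np.random.default_rng(1234)
W = rng.uniform(-1, 1, size=(100, 100))     # dense recurrent weight matrix

# spectral radius = largest absolute eigenvalue of W
rho = np.max(np.abs(np.linalg.eigvals(W)))

# rescale W so its spectral radius equals a target value
target_sr = 0.95
W_scaled = W * (target_sr / rho)
```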
states = []
radii = [0.1, 1.25, 10.0]
for sr in radii:
reservoir = Reservoir(units, sr=sr, input_scaling=0.001, lr=leak_rate, rc_connectivity=connectivity,
input_connectivity=input_connectivity)
s = reservoir.run(X_test1[:500])
states.append(s)
units_nb = 20
plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
plt.subplot(len(radii)*100+10+i+1)
plt.plot(s[:, :units_nb], alpha=0.6)
plt.ylabel(f"$sr={radii[i]}$")
plt.xlabel(f"Activations ({units_nb} neurons)")
plt.show()
- $-$ spectral radius $\rightarrow$ stable dynamics
- $+$ spectral radius $\rightarrow$ chaotic dynamics
2. The input scaling¶
It is a coefficient applied to $W_{in}$, acting as a gain on the inputs.
states = []
scalings = [0.00001, 0.001, 2.0]
for iss in scalings:
reservoir = Reservoir(units, sr=spectral_radius, input_scaling=iss, lr=leak_rate,
rc_connectivity=connectivity, input_connectivity=input_connectivity)
s = reservoir.run(X_test1[:500])
states.append(s)
units_nb = 20
plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
plt.subplot(len(scalings)*100+10+i+1)
plt.plot(s[:, :units_nb], alpha=0.6)
plt.ylabel(f"$iss={scalings[i]}$")
plt.xlabel(f"Activations ({units_nb} neurons)")
plt.show()
Average correlation of reservoir states and inputs:
- $+$ input scaling $\rightarrow$ activity is bounded to input dynamics
- $-$ input scaling $\rightarrow$ activity is freely evolving
Input scaling may also be used to adjust the relative importance of different inputs.
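For instance, per-channel gains can be folded into the columns of $W_{in}$; a NumPy sketch (the gain values here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(100, 3))   # input weights: 100 units, 3 input channels

# per-channel gains: keep channel 0 as-is, damp channels 1 and 2
gains = np.array([1.0, 0.5, 0.01])
W_in_scaled = W_in * gains         # broadcasts over columns (channels)
```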
3. The leaking rate¶
$$ x(t+1) = \underbrace{\color{red}{(1 - \alpha)} x(t)}_{\text{current state}} + \underbrace{\color{red}\alpha f(u(t+1), x(t))}_{\text{new inputs}} $$
with $\alpha \in [0, 1]$ and:
$$ f(u, x) = \tanh(W_{in} \cdotp u + W \cdotp x) $$
states = []
rates = [0.02, 0.2, 0.9]
for lr in rates:
reservoir = Reservoir(units, sr=spectral_radius, input_scaling=input_scaling, lr=lr,
rc_connectivity=connectivity, input_connectivity=input_connectivity)
s = reservoir.run(X_test1[:500])
states.append(s)
units_nb = 20
plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
plt.subplot(len(rates)*100+10+i+1)
plt.plot(s[:, :units_nb] + 2*i)
plt.ylabel(f"$lr={rates[i]}$")
plt.xlabel(f"States ({units_nb} neurons)")
plt.show()
- $+$ leaking rate $\rightarrow$ low inertia, short activity timescale
- $-$ leaking rate $\rightarrow$ strong inertia, long activity timescale
The leaking rate can be seen as a proxy for the inverse of the reservoir neurons' time constant.
Use case: falling robot ¶
features = ['com_x', 'com_y', 'com_z', 'trunk_pitch', 'trunk_roll', 'left_x', 'left_y',
'right_x', 'right_y', 'left_ankle_pitch', 'left_ankle_roll', 'left_hip_pitch',
'left_hip_roll', 'left_hip_yaw', 'left_knee', 'right_ankle_pitch',
'right_ankle_roll', 'right_hip_pitch', 'right_hip_roll',
'right_hip_yaw', 'right_knee']
prediction = ['fallen']
force = ['force_orientation', 'force_magnitude']
plot_robot(Y, Y_target, F)
ESN training¶
Using the ESN class, an optimized and distributed implementation of the Echo State Network.
from reservoirpy.nodes import ESN
reservoir = Reservoir(300, lr=0.5, sr=0.99, input_bias=False)
readout = Ridge(ridge=1e-3)
esn = ESN(reservoir=reservoir, readout=readout, workers=-1) # parallel computation: use all available workers
esn = esn.fit(X_train, y_train)
res = esn.run(X_test)
plot_robot_results(y_test, res)
print("Mean RMSE:", f"{np.mean(scores):.4f}", "±", f"{np.std(scores):.5f}")
print("Mean RMSE (with threshold):", f"{np.mean(filt_scores):.4f}", "±", f"{np.std(filt_scores):.5f}")
Mean RMSE: 0.1571 ± 0.09481 Mean RMSE (with threshold): 0.1339 ± 0.14254
acc = 0.0
for y_pred, y_true in zip(res, y_test):
true_fall = 1 if np.max(y_true) > 0.8 else 0
pred_fall = 1 if np.max(y_pred) > 0.8 else 0
acc += true_fall == pred_fall
print("Accuracy: ", acc / len(y_test))
Accuracy: 0.9806094182825484
Use case: anytime decoding of canary songs¶
Dataset can be found on Zenodo: https://zenodo.org/record/4736597
Decoded song units: phrases, i.e. repetitions of identical syllables.
- One label per phrase/syllable type, with phrase onset and offset time.
- One SIL label used to denote silence.
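To train a readout on such annotations, phrase labels with onsets and offsets must be turned into frame-wise one-hot targets. A hedged sketch (the helper name, frame rate, and label set are made up for illustration; the actual dataset format may differ):

```python
import numpy as np

def annotations_to_frames(annotations, labels, n_frames, frame_rate):
    """Convert (onset_s, offset_s, label) phrase annotations into
    frame-wise one-hot targets; unlabeled frames stay on 'SIL'."""
    idx = {lab: i for i, lab in enumerate(labels)}
    y = np.zeros((n_frames, len(labels)))
    y[:, idx["SIL"]] = 1.0                      # default: silence
    for onset, offset, lab in annotations:
        a, b = int(onset * frame_rate), int(offset * frame_rate)
        y[a:b] = 0.0                            # clear silence label
        y[a:b, idx[lab]] = 1.0                  # mark phrase label
    return y

labels = ["SIL", "A", "B"]                      # toy label set
y = annotations_to_frames([(0.1, 0.2, "A"), (0.3, 0.5, "B")],
                          labels, n_frames=50, frame_rate=100)
```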
im = plt.imread("./static/canary_outputs.png")
plt.figure(figsize=(15, 15)); plt.imshow(im); plt.axis('off'); plt.show()
ESN training¶
esn = esn.fit(X_train, y_train)
outputs = esn.run(X_test)
scores # for each song in the test set
[0.9494949494949495, 0.909044193216855, 0.9531759739013625, 0.9407697626181147, 0.9638157894736842, 0.9478260869565217, 0.892835458409229, 0.9403111739745403, 0.9212787899621864, 0.8762235554152195]
print("Mean accuracy:", f"{np.mean(scores):.4f}", "±", f"{np.std(scores):.5f}")
Mean accuracy: 0.9295 ± 0.02718
Thank you! It is now yours to try¶
Nathan Trouvain
Inria, IMN, LaBRI - Bordeaux, France
UCLA - November 14th 2023